2.1 What’s the difference between Base R, RStudio IDE, and tidyverse?

This is an understandable point of confusion, so let’s clarify:

  • Base R is the coding language that we learn in this course.
  • RStudio IDE (Integrated Development Environment) is the application we use to write R code in. There are others, but RStudio is currently the best option, although this could change in the future.
  • {tidyverse} is a set of R packages that enhance base R’s utility and usability, built around the concept of Tidy Data. We’ll learn about Tidy Data in another chapter. {tidyverse} changes how we write R code so fundamentally that some people argue R+{tidyverse} should be thought of as a meaningfully different language, with its own conventions and workflows.

There is a long-standing debate about whether base R (alone) or R+{tidyverse} is better. Thankfully, I can resolve this question for you immediately: R+{tidyverse} is better. All hail the One True Language, {tidyverse}.
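To make the difference in style concrete, here is a small sketch that keeps the mpg and hp columns for four-cylinder cars, once in base R and once with {dplyr}, one of the {tidyverse} packages. The built-in mtcars dataset is just an illustrative choice. Don't worry about the individual functions yet; we cover them in later chapters.

```r
# Assumes {dplyr} is installed; it is one of the packages in {tidyverse}
library(dplyr)

# Base R: index rows and columns with [rows, columns] and $
base_version <- mtcars[mtcars$cyl == 4, c("mpg", "hp")]

# R + {tidyverse}: a pipeline of verbs, with column names written bare
tidy_version <- mtcars |>
  filter(cyl == 4) |>
  select(mpg, hp)

# Same values either way (ignoring attributes such as row names)
all.equal(base_version, tidy_version, check.attributes = FALSE)
```

Both produce the same numbers. The difference is in how the code reads: base R nests indexing operations, while {tidyverse} chains named steps from left to right.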

2.2 RStudio IDE basics

Get familiar with the different parts of the RStudio IDE user interface with this cheatsheet, which you can also download as a pdf here.

2.2.1 Source versus Visual editor

You can view a .qmd file’s raw code in the ‘Source’ viewer. The button for this appears on the top left above the code in RStudio.

Screenshot of Source editor mode:

You can also view a live preview of the rendered file, including tables, plots, math, etc., using ‘Visual’ editor mode, although this preview is somewhat simplified compared with the .html file produced when you render. We’ll cover rendering in a later chapter.

Screenshot of Visual editor mode:

2.2.2 Keyboard shortcuts

Once later chapters have introduced the concepts mentioned below, it can be useful to come back to these cheatsheets to learn their keyboard shortcuts.

2.2.3 Particularly useful shortcuts

Windows

  • Insert Chunk: Ctrl + Alt + I
  • Insert Pipe (|>): Shift + Ctrl + M
  • Multi-line typing: Alt + mouse click-and-drag across multiple lines, then type
  • Move cursor by word instead of by character: Ctrl + arrows
  • Highlight words: Shift + Ctrl + arrows
  • Fix indentation: highlight multiple lines with the mouse, then Ctrl + I
  • Comment out (#) multiple lines: highlight multiple lines with the mouse, then Shift + Ctrl + C

Mac

  • Insert Chunk: Cmd + Option + I
  • Insert Pipe (|>): Shift + Cmd + M
  • Multi-line typing: Option + mouse click-and-drag across multiple lines, then type
  • Move cursor by word instead of by character: Option + arrows
  • Highlight words: Shift + Option + arrows
  • Fix indentation: highlight multiple lines with the mouse, then Cmd + I
  • Comment out (#) multiple lines: highlight multiple lines with the mouse, then Shift + Cmd + C

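You can also change existing keyboard shortcuts or set up additional ones via the Tools > Modify Keyboard Shortcuts… menu. For example, I have changed the shortcut for switching between the Source and Visual editors to “Cmd + `”. Note that by default the Insert Pipe shortcut inserts the {magrittr} pipe (%>%); to make it insert the native pipe (|>) instead, tick “Use native pipe operator, |>” under Tools > Global Options > Code.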

Of the above, multi-line typing is the one that reliably gets an audible ‘whoa’ from audiences. It’s easier to see than explain:

When you are a bit more experienced with RStudio, I highly recommend this blog post on shortcuts, which covers more advanced features such as Function/Variable Extraction, Renaming in Scope, Code Snippets, and advanced search and find-and-replace.

2.3 Dependencies

Install libraries from CRAN with install.packages(). This only needs to be done once, not on every run of the script.

Code
install.packages("tidyverse") # package names must be quoted strings
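Because install.packages() only needs to run once, a common pattern is to install only the packages that are missing, so a script can be re-run cheaply on any machine. The sketch below is one way to do this; find_missing_packages() and install_if_missing() are hypothetical helper names made up for illustration, not functions from any package. Note that requireNamespace() loads (but does not attach) each package it finds.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded here
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Hypothetical helper: install only what is missing; returns the missing names
install_if_missing <- function(pkgs) {
  to_install <- find_missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

For example, install_if_missing(c("tidyverse", "janitor", "roundwork", "conflicted")) would install whichever of this chapter's packages are not yet on your machine, and do nothing otherwise.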

In-development libraries are sometimes not on CRAN and can be installed directly from GitHub with devtools::install_github().

Code
install.packages("devtools")
devtools::install_github("ianhussey/tides") # username/repository

Necessary packages (aka dependencies) can be loaded with library(). For tidiness, these should usually all be loaded at the start of your script. Some chapters in this book load libraries only when they’re used, to clearly introduce which packages provide which functions.

Code
library(tidyverse) # umbrella package that loads dplyr/tidyr/ggplot2 and others

2.4 Accessing the help menu

For any function in a loaded package, simply type ? before the function’s name to open its help page. This explains the function’s purpose, its arguments, and its output.

Code
?select

If you scroll to the bottom of a function’s help page, you’ll find an ‘Index’ hyperlink. Clicking this brings you to a list of all the package’s functions. Once you get nerdy, this can be a very useful way to discover and learn all of a package’s functions.
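A few other ways to reach help pages may also be useful. The first lines below are commented out because they open help pages interactively; the last lines show a programmatic alternative to browsing the Index, using base R’s getNamespaceExports() to list every name {dplyr} exports (this assumes {dplyr} is installed).

```r
# Interactive ways to open help (run these in the RStudio console):
# ?dplyr::select                     # one function's page, even if {dplyr} isn't loaded
# help("select", package = "dplyr")  # the same page, as an ordinary function call
# help(package = "dplyr")            # the package's index of help pages
# ??rounding                         # search the help pages of all installed packages

# Programmatic alternative to the Index: list the names a package exports
dplyr_exports <- sort(getNamespaceExports("dplyr"))
length(dplyr_exports)  # number of exported names
head(dplyr_exports)
```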

2.5 Namespace collisions: a common source of errors

Some common packages contain identically named functions that do different things and take different arguments. For example, if you load both {dplyr} and {MASS}, select() can refer to either dplyr::select() or MASS::select(). R uses the version from whichever package was loaded most recently, so code that runs in one session can fail in another, depending on the order of your library() calls.

You can see if you have two identically named functions loaded by opening the help page and checking whether more than one entry appears (e.g., with ?select).
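Another way to check, sketched below with {dplyr} and base R’s {stats} package (which is attached by default), is utils::find(). It lists every attached package that provides a given name, in search-path order; the first entry is the one a bare function call will use.

```r
library(dplyr)
# When {dplyr} is attached, R prints a startup message saying that
# dplyr's filter() and lag() mask the versions in {stats}

# Every attached package that provides `filter`, in search order
find("filter")

# The first match wins, so a bare filter() now means dplyr::filter()
identical(filter, dplyr::filter)
```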

Avoid this by loading only the packages you need. Debug errors by thinking about these common namespace collisions:

Function    tidyverse package   Conflicting package(s)   Notes
filter      dplyr               stats                    stats::filter() is for signal processing (time series)
lag         dplyr               stats                    stats::lag() shifts the time base of a time series; dplyr::lag() shifts vector values
select      dplyr               MASS                     MASS::select() is for ridge regression (lm.ridge() fits)
slice       dplyr               IRanges / S4Vectors      Common in Bioconductor workflows
rename      dplyr               plyr                     plyr::rename() takes a named character vector of new names
summarize   dplyr               Hmisc                    Hmisc::summarize() differs in behavior
intersect   dplyr               base                     dplyr's version masks base::intersect() and also works on data frames
union       dplyr               base                     dplyr's version masks base::union() and also works on data frames
setdiff     dplyr               base                     dplyr's version masks base::setdiff() and also works on data frames
count       dplyr               plyr                     Different behavior/output in plyr::count()
desc        dplyr               IRanges                  Conflicts with IRanges sorting
mutate      dplyr               plyr                     Conflicts common when plyr is loaded
arrange     dplyr               plyr                     Subtle differences; dplyr preferred

Solve this issue either by specifying which package should be used each time you use the function (e.g., dplyr::select() instead of select()) or by specifying below your library() calls which version is preferred:

Code
library(conflicted)
conflict_prefer(name = "select", winner = "dplyr")
[conflicted] Will prefer dplyr::select over any other package.
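The first solution, explicit package::function() calls, can be sketched as follows. Both versions of filter() coexist without trouble because each call names its package; mtcars and the 3-point moving average are just illustrative inputs.

```r
# {dplyr}'s filter(): keep the rows of a data frame that meet a condition
four_cyl <- dplyr::filter(mtcars, cyl == 4)
nrow(four_cyl)  # 11

# {stats}' filter(): a linear filter, here a centred 3-point moving average
moving_avg <- stats::filter(1:5, rep(1/3, 3))
as.numeric(moving_avg)  # NA 2 3 4 NA
```

Writing the package name every time is more verbose, but it makes the code immune to the order of library() calls.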

2.6 Assignment of objects

Assignment of objects is done via <- by convention.

Code
# set the variable x to be the number 5
x <- 5

# print the contents of x
x
[1] 5

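Technically you can also use =, but it’s best to avoid it: = is also used to set function arguments (e.g., round(x, digits = 2)), so reserving <- for assignment keeps the two roles visually distinct.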

Code
# set the variable y to be the string "hello"
y = "hello"

# print the contents of y
y
[1] "hello"
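The two roles really do behave differently: inside a function call, = only names an argument, while <- also creates an object as a side effect. The sketch below uses sum(), which accepts arbitrarily named arguments; the object names vals_a and vals_b are arbitrary, and the exists() results assume a fresh session in which neither name has been defined.

```r
# `=` inside a call only names an argument: nothing is assigned
total_a <- sum(vals_a = c(1, 2, 3))
exists("vals_a")  # FALSE in a fresh session

# `<-` inside a call also creates an object, as a side effect
total_b <- sum(vals_b <- c(1, 2, 3))
exists("vals_b")  # TRUE

c(total_a, total_b)  # 6 6
```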

It’s less well known, but you can also do “right-assignment” (->) instead of the much more common left-assignment (<-).

Code
# set the variable z to be the string "really? yes."
"really? yes." -> z

# print the contents of z
z
[1] "really? yes."

2.7 Rounding: round() probably doesn’t do what you think

Did you know that R doesn’t use the rounding method most of us are taught in school, where .5 is rounded up to the next integer? Instead it uses “banker’s rounding”, which avoids systematically inflating totals and averages when you round a very large number of numbers, but is worse for reporting the results of specific analyses.

This is easier to show than explain. The round() function rounds each of the numbers passed to it. What do you expect the output to be?

Code
round(c(0.5, 
        1.5, 
        2.5, 
        3.5, 
        4.5, 
        5.5))

[1] 0 2 2 4 4 6

Why is this? R’s round() function uses “banker’s rounding”, which rounds a trailing 5 to the nearest even digit: 0.5 becomes 0, 1.5 becomes 2, and 2.5 becomes 2. This is a good thing in many contexts like accounting, but it’s usually not what we want or expect when rounding specific statistical results for inclusion in a report or manuscript.
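The advantage of banker’s rounding for large numbers of values can be checked with the numbers above: ties go up as often as they go down here, so the total is preserved, whereas rounding every tie upwards inflates it. floor(x + 0.5) is used below only as a quick round-half-up for non-negative numbers; it is not a general replacement for the functions recommended next.

```r
ties <- c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)

sum(ties)               # 18: the exact total
sum(round(ties))        # 18: banker's rounding sends ties up and down equally often here
sum(floor(ties + 0.5))  # 21: rounding every tie upwards inflates the total
```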

In most of your R scripts, you should probably use janitor::round_half_up() instead, which produces the round-5-upwards behavior you were probably taught in school.

Code
library(janitor)

janitor::round_half_up(c(0.5, 
                         1.5, 
                         2.5, 
                         3.5, 
                         4.5, 
                         5.5))
[1] 1 2 3 4 5 6

Another great option is round_up() from the {roundwork} package, which my PhD student Lukas Jung wrote before joining our research group.

Code
library(roundwork) 

roundwork::round_up(c(0.5, 
                      1.5, 
                      2.5, 
                      3.5, 
                      4.5, 
                      5.5))
[1] 1 2 3 4 5 6

2.8 Exercises

Edit this .qmd file to make the following changes.

2.8.1 Fix indentation / white space

Read the code in the chunk below. We will cover these functions in later chapters; you don’t need to understand them yet. Notice that the indentation or ‘white space’ is somewhat chaotic. Fix this with a keyboard shortcut: with your mouse, highlight the code in the chunk below and press Ctrl + I (Windows) or Cmd + I (Mac) to fix the indentation. Notice how much easier it is to read.

You can undo this with Ctrl + Z (Windows) or Cmd + Z (Mac) if you want to compare the before and after versions again.

Code
# create table
dat_processed_long %>%
  # summarize mean and SD by subscale
dplyr::group_by(subscale) %>%
  dplyr::summarize(n = dplyr::n(),
m = mean(score, na.rm = TRUE),
               sd = sd(score, na.rm = TRUE)) %>%
  # round estimates 
  dplyr::mutate(m = janitor::round_half_up(m, digits = 2),
  sd = janitor::round_half_up(sd, digits = 2)) %>%
# print nicer table
knitr::kable(align = 'r') |>
  kableExtra::kable_styling()